
    Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    The variety and complexity of potentially related data resources available for querying (webpages, databases, data warehouses) has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within the Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured ones, and II focusing on providing a unified view of diverse structured data sources. However, most current IE and II methods that could be applied to the problem of integration across sources require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input via weakly-supervised methods for IE and II. In particular, we argue that graph-based representations of data, and learning over such graphs, can yield effective and scalable methods for large-scale Information Extraction and Integration.

    Within IE, we focus on the problem of assigning semantic classes to entities. First, we develop a context pattern induction method to extend small initial entity lists of various semantic classes. We also demonstrate that features derived from such extended entity lists can significantly improve the performance of state-of-the-art discriminative taggers. The output of pattern-based class-instance extractors is often high-precision but low-recall, which is inadequate for many real-world applications. We use Adsorption, a graph-based label propagation algorithm, to significantly increase the recall of an initial high-precision, low-recall pattern-based extractor by combining evidence from unstructured and structured text corpora. Building on Adsorption, we propose a new label propagation algorithm, Modified Adsorption (MAD), and demonstrate its effectiveness on various real-world datasets. Additionally, we show how class-instance acquisition performance in the graph-based SSL setting can be improved by incorporating additional semantic constraints available in independently developed knowledge bases.

    Within Information Integration, we develop a novel system, Q, which draws ideas from machine learning and databases to help a non-expert user construct data-integrating queries based on keywords (across databases) and interactive feedback on answers. We also present an information need-driven strategy for automatically incorporating new sources and their information in Q. Finally, we demonstrate that Q's learning strategy is highly effective in combining the outputs of "black box" schema matchers and in re-weighting bad alignments, which removes the need for the expensive mediated schemas that most previous systems have required.
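
    As a rough illustration of the label propagation idea behind Adsorption and MAD, the minimal sketch below implements a MAD-style iterative update over a dense adjacency matrix. It is an assumed simplification: the full algorithm derives per-node injection, continuation, and abandonment probabilities from random walks, which are collapsed here into global weights mu1, mu2, and mu3.

        import numpy as np

        def mad_propagate(W, Y_seed, mu1=1.0, mu2=0.01, mu3=0.01, n_iter=20):
            """MAD-style label propagation (a sketch, not the full algorithm).
            W: (n, n) symmetric nonnegative edge weights.
            Y_seed: (n, k) float seed label distributions; the last column is
                a dummy "abandon" label that regularizes uncertain nodes.
            """
            n, k = Y_seed.shape
            is_seed = (Y_seed.sum(axis=1) > 0).astype(float)
            r = np.zeros(k)
            r[-1] = 1.0                                    # prior mass on the dummy label
            M = mu1 * is_seed + mu2 * W.sum(axis=1) + mu3  # per-node normalizer
            Y = Y_seed.copy()
            for _ in range(n_iter):
                # seed injection + neighbor evidence + dummy-label regularization
                Y = (mu1 * is_seed[:, None] * Y_seed
                     + mu2 * (W @ Y)
                     + mu3 * r) / M[:, None]
            return Y

    High-precision pattern-extracted instances would enter as nonzero rows of Y_seed, and unlabeled nodes acquire label distributions from their neighbors, which is how recall increases without additional annotation.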
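
    Purely as an illustration of the kind of feedback-driven reweighting described for Q (the update rule, names, and signatures below are assumptions, not the system's actual code), a multiplicative-weights combination of black-box matcher scores might look like this:

        import numpy as np

        def alignment_score(weights, matcher_scores):
            """Combined confidence for one candidate alignment."""
            return float(weights @ matcher_scores)

        def reweight_matchers(weights, matcher_scores, feedback, beta=0.8):
            """Multiplicative-weights update from user feedback on an answer.
            weights: (n_matchers,) current trust in each matcher.
            matcher_scores: (n_matchers,) each matcher's confidence in the
                alignment that produced the judged answer.
            feedback: +1 if the user accepted the answer, -1 if rejected.
            """
            if feedback < 0:
                # Penalize matchers in proportion to how strongly they backed
                # the alignment behind the rejected answer.
                weights = weights * beta ** matcher_scores
            return weights / weights.sum()        # keep weights normalized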

    Interpretable semantic vectors from a joint model of brain- and text-based meaning

    Vector space models (VSMs) represent word meanings as points in a high-dimensional space. VSMs are typically created from large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words, and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
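
    As a hypothetical sketch of the joint factorization idea (a shared non-negative word embedding that reconstructs both a corpus-statistics matrix and a brain-activation matrix), the code below uses projected gradient steps with an l1 penalty; the objective, update rule, and names are illustrative assumptions rather than the paper's implementation:

        import numpy as np

        def joint_embedding(X, B, k=50, lam=0.1, lr=1e-3, n_iter=500, seed=0):
            """JNNSE-flavored joint factorization (illustrative sketch).
            X: (n_words, n_text_feats) corpus statistics.
            B: (n_brain_words, n_voxels) brain activations, aligned with the
               first n_brain_words rows of X.
            Learns a shared nonnegative embedding A plus per-source dictionaries.
            """
            rng = np.random.default_rng(seed)
            n, _ = X.shape
            nb, v = B.shape
            A = rng.random((n, k))
            Dt = rng.random((k, X.shape[1]))
            Db = rng.random((k, v))
            for _ in range(n_iter):
                Rt = A @ Dt - X                   # corpus reconstruction residual
                Rb = A[:nb] @ Db - B              # brain reconstruction residual
                gA = Rt @ Dt.T
                gA[:nb] += Rb @ Db.T              # brain rows get both gradients
                A = np.maximum(A - lr * (gA + lam * np.sign(A)), 0.0)
                Dt = np.maximum(Dt - lr * (A.T @ (A @ Dt - X)), 0.0)
                Db = np.maximum(Db - lr * (A[:nb].T @ (A[:nb] @ Db - B)), 0.0)
            return A, Dt, Db

    Because A is shared, words with brain data pull the embedding toward neural similarity structure, while Dt still ties every word to its corpus statistics; this is one way to realize the complementary strengths the abstract describes.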

    FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop

    Given multiple data sets of relational data that share a number of dimensions, how can we efficiently decompose our data into the latent factors? Factorization of a single matrix or tensor has attracted much attention, as, e.g., in the Netflix challenge, with users rating movies. However, we often have additional side information, e.g., demographic data about the users in the Netflix example above. Incorporating this additional information leads to the coupled factorization problem. So far, it has been solved only for relatively small datasets.

    We provide a distributed, scalable method for decomposing matrices, tensors, and coupled data sets through stochastic gradient descent on a variety of objective functions. We offer the following contributions: (1) Versatility: our algorithm can perform matrix, tensor, and coupled factorization, with flexible objective functions including the Frobenius norm, the Frobenius norm with l1-induced sparsity, and non-negative factorization. (2) Scalability: FlexiFaCT scales to unprecedented sizes in both the data and the model, with up to billions of parameters, and runs on standard Hadoop. (3) Convergence proofs showing that FlexiFaCT converges on this variety of objective functions, even with projections.
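
    As a single-machine illustration of the coupled objective (Frobenius loss only; the actual system shards disjoint blocks of these updates across Hadoop workers, and the entry format and names here are assumptions), stochastic gradient descent over observed cells of a tensor and a coupled matrix might look like:

        import numpy as np

        def coupled_sgd(X_entries, Y_entries, dims, rank=10, lr=0.01,
                        n_epochs=10, seed=0):
            """SGD for coupled matrix-tensor factorization (sketch).
            X_entries: list of ((i, j, k), value) cells of a 3-way tensor
                approximated by outer products of columns A[:, r], B[:, r], C[:, r].
            Y_entries: list of ((i, m), value) cells of a matrix Y ~ A @ D.T
                that shares mode A with the tensor.
            dims: (I, J, K, M) sizes of the four modes.
            """
            I, J, K, M = dims
            rng = np.random.default_rng(seed)
            A, B, C, D = (rng.normal(scale=0.1, size=(d, rank))
                          for d in (I, J, K, M))
            for _ in range(n_epochs):
                for (i, j, k), x in X_entries:    # tensor cells
                    Ai, Bj, Ck = A[i].copy(), B[j].copy(), C[k].copy()
                    err = x - np.sum(Ai * Bj * Ck)
                    A[i] += lr * err * Bj * Ck
                    B[j] += lr * err * Ai * Ck
                    C[k] += lr * err * Ai * Bj
                for (i, m), y in Y_entries:       # coupled matrix cells share A
                    Ai = A[i].copy()
                    err = y - Ai @ D[m]
                    A[i] += lr * err * D[m]
                    D[m] += lr * err * Ai
            return A, B, C, D

    Cells touching disjoint rows of A, B, C, and D produce non-conflicting updates, so blocks of them can be processed in parallel; that independence is the property a Hadoop partitioning can exploit.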